13 research outputs found

    Feature level metrics based on size and similarity in software product line adoption

    Introducing software product lines is a natural way to cope with a large number of software variants and the maintenance difficulties they bring. The task can become more complicated with a fourth-generation language, in our case Magic. Feature extraction is an important step in product line adoption, but the extracted features can cover large proportions of the code and be hard to comprehend, so appropriate methods are needed to make the extracted information easier to handle. In this work we present feature-level metrics that highlight valuable information about both the extraction results and the features themselves, and that can be used to further the process of product line adoption. The metrics are based on the size and the pairwise similarity of the features of four different variants of the same system. Properly measured and used, these metrics can be vital in aiding product line adoption.

    Towards JavaScript program repair with Generative Pre-trained Transformer (GPT-2)

    The goal of Automated Program Repair (APR) is to find fixes for software bugs without human intervention. The so-called Generate and Validate (G&V) approach has been the most popular method in the last few years: the APR tool creates a patch, which is then validated against an oracle. Natural Language Processing (NLP) has attracted great interest in recent years, with new pre-trained models shattering records on tasks ranging from sentiment analysis to question answering. These deep learning models often inspire the APR community as well. Such approaches usually require a large dataset on which the model can be trained (or fine-tuned) and evaluated. The criterion for accepting a patch depends on the underlying dataset, but usually the generated patch must be exactly the same as the one created by a human developer. As NLP models become more and more capable of forming sentences, and of combining those sentences into coherent paragraphs, APR tools are also getting better at generating syntactically and semantically correct source code. As the Generative Pre-trained Transformer (GPT) model is now available to everyone thanks to the NLP and AI research community, it can be fine-tuned to specific tasks (not necessarily on natural language). In this work we use the GPT-2 model to generate source code. To the best of our knowledge, the GPT-2 model has not been used for Automated Program Repair so far. The model is fine-tuned for a specific task: fixing JavaScript bugs automatically. To do so, we trained the model on 16863 JS code snippets, from which it could learn the nature of the observed programming language. In our experiments we observed that the GPT-2 model was able to learn how to write syntactically correct source code on almost every attempt, although it failed to learn good bug fixes in some cases. Nonetheless, it was able to generate the correct fixes in most of the cases, resulting in an overall accuracy of up to 17.25%.
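    The exact-match acceptance criterion mentioned in the abstract can be illustrated with a minimal sketch. The function names and the whitespace normalization step are assumptions made here for illustration; the abstract does not say how patches were compared beyond requiring them to be the same as the developer fix.

```python
def normalize(code: str) -> str:
    """Collapse runs of whitespace (assumption: formatting differences are ignored).

    Note: this also collapses whitespace inside string literals, which a
    stricter comparison would have to preserve.
    """
    return " ".join(code.split())


def exact_match_accuracy(generated: list[str], reference: list[str]) -> float:
    """Fraction of generated patches identical to the developer fix."""
    if len(generated) != len(reference):
        raise ValueError("generated and reference must have the same length")
    if not generated:
        return 0.0
    hits = sum(normalize(g) == normalize(r) for g, r in zip(generated, reference))
    return hits / len(generated)


if __name__ == "__main__":
    # Hypothetical toy data: one patch matches its reference, one does not.
    generated = ["return a  + b;", "return a * b;"]
    reference = ["return a + b;", "return a / b;"]
    print(exact_match_accuracy(generated, reference))
```

    Under this criterion a patch that is semantically correct but textually different from the developer fix still counts as a miss, which is one reason exact-match accuracy tends to be a conservative measure.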

    FixJS: A Dataset of Bug-fixing JavaScript Commits

    The field of Automated Program Repair (APR) has received increasing attention in recent years, both from the academic world and from leading IT companies. Its main goal is to repair software bugs automatically, thus significantly reducing the cost of development and maintenance. Recent works use state-of-the-art deep learning models to predict correct patches, and for these models training on a large amount of data is unavoidable in almost every scenario. Despite this, readily accessible data in the field is very scarce. To contribute to related research, we present FixJS, a dataset containing bug-fixing information from ~2 million commits. The commits were gathered from GitHub and processed locally to obtain both the buggy (before the bug-fixing commit) and the fixed (after the fix) version of the same program. We focused on JavaScript functions, as JavaScript is one of the most popular programming languages globally and functions are first-class objects in it. The data includes more than 300,000 samples of such functions, including commit information, before/after states and three source code representations.

    Exploring Plausible Patches Using Source Code Embeddings in JavaScript

    Despite the immense popularity of the Automated Program Repair (APR) field, the question of patch validation is still open. Most present-day approaches follow the so-called Generate-and-Validate approach, where a candidate solution is first generated and then validated against an oracle. The latter, however, might not give a reliable result because such oracles are imperfect; one of them is usually the test suite. Although (re-)running the test suite is the obvious option, in real-life applications the problem of over- and underfitting often occurs, resulting in inadequate patches. Efforts to tackle this problem include patch filtering, test suite expansion, careful patch generation and many more. Most approaches to date use post-filtering that relies either on test execution traces or on some similarity concept measured on the generated patches. Our goal is to investigate the nature of these similarity-based approaches. To do so, we trained a Doc2Vec model on an open-source JavaScript project and generated 465 patches for 10 bugs in it. These plausible patches, along with the developer fix, are then ranked based on their similarity to the original program. We analyzed these similarity lists and found that plain document embeddings may lead to misclassification, as they fail to capture nuanced code semantics. Nevertheless, in some cases they also provided useful information, thus helping to better understand the area of Automated Program Repair. Comment: Paper accepted at the APR2021 conference.
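    The similarity-based ranking step described above can be sketched as follows. This is not the paper's implementation: the paper trains a Doc2Vec model, whereas this sketch swaps in a plain bag-of-tokens vector with cosine similarity so that it runs with the standard library alone. The tokenizer, the function names and the toy patches are all hypothetical.

```python
import math
import re
from collections import Counter

# Crude tokenizer (assumption): identifiers, integer literals, or any single
# non-whitespace character such as an operator or bracket.
TOKEN = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*|\d+|\S")


def embed(code: str) -> Counter:
    """Stand-in for a document embedding: a token-count vector."""
    return Counter(TOKEN.findall(code))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_patches(original: str, candidates: dict[str, str]) -> list[tuple[str, float]]:
    """Sort candidate patches by similarity to the original program, highest first."""
    base = embed(original)
    scored = [(name, cosine(base, embed(code))) for name, code in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    original = "function add(a, b) { return a - b; }"
    candidates = {
        "dev_fix": "function add(a, b) { return a + b; }",
        "patch_1": "function add(a, b) { return 0; }",
    }
    for name, score in rank_patches(original, candidates):
        print(f"{name}: {score:.3f}")
```

    A token-count vector ignores token order and meaning even more than a learned embedding does, so it shares the limitation the abstract reports: two patches can score as highly similar to the original while behaving very differently.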

    Fine-Tuning GPT-2 to Patch Programs, is it Worth it?
